Unrestricted permutation forces extrapolation: variable importance requires at least one more model, or there is no free variable importance

Abstract

This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationally efficient and widely available in software. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. The purpose of our work here is to (i) review this growing body of literature, (ii) provide further demonstrations of these drawbacks along with a detailed explanation of why they occur, and (iii) advocate for alternative measures that involve additional modeling. In particular, we describe how breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects across various model setups and find support for previous claims in the literature that PaP metrics can vastly over-emphasize correlated features in both variable importance measures and partial dependence plots. As an alternative, we discuss and recommend more direct approaches that involve measuring the change in model performance after muting the effects of the features under investigation.
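The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's experimental setup: the simulated data, the linear model, and every function name here are assumptions made for illustration. The first part measures how many hold-out rows fall outside the bulk of the training data once one of two correlated features is permuted. The second part compares a permute-and-predict importance with one simple refit-based alternative (drop the feature and fit one more model). Because a correctly specified linear model does not extrapolate badly, the importance comparison only shows that the two measures answer different questions; the distortions reported for flexible learners are not reproduced here.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

def mahalanobis_sq(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row from the training cloud."""
    Z = X - mean
    return np.einsum("ij,jk,ik->i", Z, cov_inv, Z)

# Simulated data (assumption): two strongly correlated features.
rng = np.random.default_rng(0)
n, rho = 2000, 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=2 * n)
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=2 * n)
X_train, y_train = X[:n], y[:n]
X_test, y_test = X[n:], y[n:]

# Permute feature 0 in the hold-out data, breaking its link to feature 1.
X_perm = X_test.copy()
X_perm[:, 0] = rng.permutation(X_perm[:, 0])

# Part 1: how often do rows land where the training data is sparse?
mean = X_train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_train, rowvar=False))
threshold = np.quantile(mahalanobis_sq(X_train, mean, cov_inv), 0.99)
frac_original = float(np.mean(mahalanobis_sq(X_test, mean, cov_inv) > threshold))
frac_permuted = float(np.mean(mahalanobis_sq(X_perm, mean, cov_inv) > threshold))

# Part 2a: permute-and-predict importance (reuses the already fitted model).
full = fit_ols(X_train, y_train)
base_error = mse(y_test, predict(full, X_test))
pap_importance = mse(y_test, predict(full, X_perm)) - base_error

# Part 2b: drop-and-refit importance (requires fitting one more model).
reduced = fit_ols(X_train[:, [1]], y_train)
refit_importance = mse(y_test, predict(reduced, X_test[:, [1]])) - base_error

print(f"rows beyond the 99% training contour: original={frac_original:.3f}, "
      f"permuted={frac_permuted:.3f}")
print(f"importance of feature 0: permute-and-predict={pap_importance:.3f}, "
      f"drop-and-refit={refit_importance:.3f}")
```

In this sketch the permuted hold-out set contains many rows that lie far from any training observation, which is the sparse-region effect the abstract describes. Swapping the linear model for a flexible learner would be the natural next step for probing how extrapolation affects the importance scores themselves.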


Similar resources

Statistical Inference for Variable Importance

Many statistical problems involve the learning of an importance/effect of a variable for predicting an outcome of interest based on observing a sample of n independent and identically distributed observations on a list of input variables and an outcome. For example, though prediction/machine learning is, in principle, concerned with learning the optimal unknown mapping from input variables to a...


Hierarchical Testing of Variable Importance

A frequently encountered challenge in high-dimensional regression is the detection of relevant variables. Variable selection suffers from instability and the power to detect relevant variables is typically low if predictor variables are highly correlated. When taking the multiplicity of the testing problem into account, the power diminishes even further. To gain power and insight, it can be adv...


Variable Importance Using Decision Trees

Decision trees and random forests are well established models that not only offer good predictive performance, but also provide rich feature importance information. While practitioners often employ variable importance methods that rely on this impurity-based information, these methods remain poorly characterized from a theoretical perspective. We provide novel insights into the performance of t...


Cutoff Threshold of Variable Importance in Projection for Variable Selection

At present, variable selection is gaining prominence because it alleviates the trouble of measuring multiple variables per sample. Partial least squares regression (PLS-R) and the Variable Importance in Projection (VIP) score are combined for variable selection. A VIP score greater than 1 is the typical rule for selecting relevant variables. Due to a const...


Model Predictive Path Integral Control using Covariance Variable Importance Sampling

In this paper we present a Model Predictive Path Integral (MPPI) control algorithm that is derived from the path integral control framework and a generalized importance sampling scheme. In order to operate in real time, we parallelize the sampling-based component of the algorithm and achieve massive speed-up by using a Graphics Processing Unit (GPU). We compare MPPI against traditional model pre...



Journal

Journal title: Statistics and Computing

Year: 2021

ISSN: 0960-3174, 1573-1375

DOI: https://doi.org/10.1007/s11222-021-10057-z